ARC
File Archive Utility
(C) COPYRIGHT 1985 by System Enhancement Associates; ALL RIGHTS RESERVED
This file describes the ARC file utility, version 4.4, which was
created by System Enhancement Associates on 25 October 1985.
ARC is the copyrighted property of System Enhancement Associates. You
are granted a limited license to use ARC, and to copy it and
distribute it, provided that the following conditions are met:
1) No fee may be charged for such copying and distribution.
2) ARC may ONLY be distributed in its original, unmodified state.
Any voluntary contributions for the use of this program will be
appreciated, and should be sent to:
System Enhancement Associates
21 New Street
Wayne, NJ 07470
If you are using ARC in a commercial environment, then the
contribution is not voluntary.
A word about user supported software:
The user supported software concept (usually referred to as
"freeware") is an attempt to provide software at low cost. The cost
of offering a new product by conventional channels is staggering, and
hence dissuades many independent authors and small companies from
developing and promoting their ideas. User supported software is an
attempt to develop a new marketing channel, where products can be
introduced at low cost.
If user supported software works, then everyone will benefit. The
user will benefit by receiving quality products at low cost, and by
being able to "test drive" software thoroughly before purchasing it.
The author benefits by being able to enter the commercial software
arena without first needing large sources of venture capital.
But it can only work with your support. We're not just talking about
ARC here, but about all user supported software. If you find that you
are still using a program after a couple of weeks, then pretty
obviously it is worth something to you, and you should send in a
contribution.
And now, back to ARC:
ARC is used to create and maintain file archives. An archive is a
group of files collected together into one file in such a way that the
individual files may be recovered intact.
ARC is different from other archive and library utilities in that it
automatically compresses the files being archived, so that the
resulting archive takes up a minimum amount of space.
When ARC is used to add a file to an archive it analyzes the file to
determine which of four storage methods will result in the greatest
savings. These four methods are:
1) No compression; the file is stored intact.
2) Repeated-character compression; repeated sequences of the same
byte value are collapsed into a three-byte code sequence.
3) Huffman squeezing; the file is compressed into variable length bit
strings, similar to the method used by the SQ programs.
4) Lempel-Ziv compression; the file is stored as a series of twelve
bit codes which represent character strings, and which are created
"on the fly".
Note that since one of the four methods involves no compression at
all, the resulting archive entry will never be larger than the
original file.
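As a rough illustration of methods 1 and 2 and of the selection step, the following Python sketch collapses runs into three-byte sequences and keeps whichever form is smaller. The marker byte 0x90, the (value, marker, count) layout, the escape rule, and all function names are assumptions made for this example; this file does not specify the encoding ARC actually uses.

```python
MARKER = 0x90  # assumed run marker; not taken from this document


def pack_runs(data: bytes) -> bytes:
    """Collapse runs of one byte value into (value, MARKER, count)."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        if value == MARKER:
            # A literal marker byte is escaped as (MARKER, 0), once per byte.
            out += bytes([MARKER, 0]) * run
        elif run >= 3:
            out += bytes([value, MARKER, run])
        else:
            out += bytes([value]) * run  # short runs are cheaper left alone
        i += run
    return bytes(out)


def unpack_runs(packed: bytes) -> bytes:
    """Reverse pack_runs."""
    out = bytearray()
    i = 0
    while i < len(packed):
        value = packed[i]
        if value == MARKER:  # escaped literal marker
            out.append(MARKER)
            i += 2
        elif (i + 2 < len(packed) and packed[i + 1] == MARKER
              and packed[i + 2] != 0):
            out += bytes([value]) * packed[i + 2]
            i += 3
        else:
            out.append(value)
            i += 1
    return bytes(out)


def choose_method(data: bytes):
    """Return (method name, stored bytes) for the smallest candidate."""
    candidates = {"stored": data, "packed": pack_runs(data)}
    name = min(candidates, key=lambda key: len(candidates[key]))
    return name, candidates[name]
```

Because the uncompressed form is always one of the candidates, the chosen data is never longer than the input, which is the property the note above describes.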
USING ARC
=========
ARC is invoked with a command of the following format:
ARC <x> <arcname> [<template> . . .]
Where:
<x> is an ARC command letter (see below), in either upper or lower
case.
<arcname> is the name of the archive to act on, with or without an
extension. If no extension is supplied, then ".ARC" is assumed.
The archive name may include path and drive specifiers.
<template> is one or more file name templates. The "wildcard"
characters "*" and "?" may be used. A file name template may only
include a path or drive specifier if you are adding a file to an
archive.
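The default-extension rule for <arcname> can be sketched in Python as follows. The function name is hypothetical, and the standard ntpath module is used only because it understands DOS-style drive and path separators; ARC's own parsing may differ in edge cases.

```python
import ntpath


def archive_filename(arcname: str) -> str:
    """Append ".ARC" when the archive name carries no extension."""
    _root, extension = ntpath.splitext(arcname)
    return arcname if extension else arcname + ".ARC"
```

For example, archive_filename("my") yields "my.ARC", while a name that already has an extension is returned unchanged.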
If ARC is invoked with no arguments (by typing "ARC", and pressing
"enter"), then a brief command summary is displayed.
ARC COMMANDS
============
Following is a brief summary of the available ARC commands:
a = add files to archive
u = update files in archive
m = move files to archive
d = delete files from archive
x,e = extract files from archive
r = run files from archive
p = copy files from archive to stdout
l = list files in archive
v = verbose listing of files in archive
t = test archive integrity
c = convert entry to new packing method
b = retain backup copy of archive
w = suppress warning messages
n = suppress notes and comments
These commands are explained in more detail below.
ADDING FILES
------------
Files are added to an archive using the "A" (Add), "U" (Update), or
"M" (Move) commands. Add always adds the file. Update differs from
Add in that the file is only added if it is not already in the
archive, or if it is newer than the corresponding entry in the
archive. Move differs from Add in that the source file is deleted
once it has been added to the archive.
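The difference between the three commands can be summarized in a short Python sketch. The dictionary that maps entry names to modification times and the function name are assumptions made for illustration; they do not reflect how ARC stores its entries.

```python
from datetime import datetime


def plan_add(command: str, archive: dict, name: str, modified: datetime):
    """Return (add_to_archive, delete_source) for one candidate file.

    `archive` maps upper-case entry names to the modification time
    recorded for that entry (a hypothetical in-memory model).
    """
    command = command.lower()
    if command == "a":
        return True, False
    if command == "m":
        return True, True  # Move deletes the source after adding it
    if command == "u":
        stored = archive.get(name.upper())
        return stored is None or modified > stored, False
    raise ValueError("not an add-type command: " + command)
```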
For example, if you wish to add a file named "TEST.DAT" to an archive
named "MY.ARC", you would use a command of the form:
ARC a my test.dat
If you have an archive named "TEXT.ARC", and you wanted to add to it
all of your files with an extension of ".TXT" which have been created
or changed since they were last archived, then you would type:
ARC u text *.txt
If you wanted to move all files in your current directory into an
archive named "SUM.ARC", you could use a command of the form:
ARC m sum *.*
If you wanted to add all files with a ".C" extension, and all files
named "STUFF" to an archive named "JUNK.ARC", you could type:
ARC a junk *.c stuff.*
Archive entries are always maintained in alphabetic order. Archive
entries may not have duplicate names. If you add a file to an archive
that already contains a file by that name, then the existing entry in
the archive is replaced. Also, the archive itself and its backup will
not be added.
You may also add a file which is in a directory other than your
current directory. For example, it is perfectly legal to type:
ARC a junk c:\dustbin\stuff
The A, U, and M commands are the ONLY commands which allow you to give
a drive or path. Also, you cannot add two files with the same name.
In other words, if you have a file named "C:\DUSTBIN\STUFF.TXT" and
another file named "C:\BUCKET\STUFF.TXT", then typing:
arc a junk c:\dustbin\*.* c:\bucket\*.*
will not work.
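If you are preparing a list of files from several directories, a check like the following Python sketch can find name clashes before ARC is run. The function name is hypothetical, and comparing names without regard to case is an assumption based on MS-DOS file naming.

```python
import ntpath
from collections import defaultdict


def find_name_clashes(paths):
    """Group DOS paths whose file names (without directory) coincide."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[ntpath.basename(path).upper()].append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}
```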
DELETING FILES
--------------
Archive entries are deleted with the "D" (Delete) command. For
example, if you had an archive named "JUNK.ARC", and you wished to
delete all entries in it with a filename extension of ".C", you could
type:
ARC d junk *.c
EXTRACTING FILES
----------------
Archive entries are extracted with the "E" (Extract) and "X" (eXtract)
commands. For example, if you had an archive named "JUNK.ARC", and
you wanted all files in it with an extension of ".TXT" or ".DOC" to be
recreated on your disk, you could type:
ARC x junk *.txt *.doc
If you wanted to extract all of the files in an archive named
"JUNK.ARC", you could simply type:
ARC x junk
Whatever method of file compression was used in storing the files is
reversed, and uncompressed copies are created in the current
directory.
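The way templates select archive entries can be approximated in Python as shown below. This is only a sketch: the fnmatch module follows Unix shell rules, which differ from MS-DOS wildcard matching in some edge cases, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def select_entries(entries, templates):
    """Return the entries matched by any template; no templates means all."""
    if not templates:
        return list(entries)
    wanted = [template.upper() for template in templates]
    return [entry for entry in entries
            if any(fnmatchcase(entry.upper(), t) for t in wanted)]
```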
RUNNING FILES
-------------
Archive entries may be run without being extracted by use of the "R"
(Run) command. For example, if you had an archive named "JUNK.ARC"
which contained a file named "LEMON.COM", which you wished to run, you
could type:
ARC r junk lemon.com
You can run any file from an archive which has an extension of ".COM",
".EXE", or ".BAT". You cannot run interpretive BASIC programs from an
archive, nor can you give arguments to a program you are running from
an archive.
In practice, the file to be run is extracted, run, and then deleted.
All in all, this is a fairly useless command.
PRINTING FILES
--------------
Archive entries may be examined with the "P" (Print) command. This
works the same as the Extract command, except that the files are not
created on disk. Instead, the contents of the files are written to
standard output. For example, if you wanted to see the contents of
every ".TXT" file in an archive named "JUNK.ARC", but didn't want them
saved on disk, you could type:
ARC p junk *.txt
If you wanted them to be printed on your printer instead of on your
screen, you could type:
ARC p junk *.txt >prn
LISTING ARCHIVE ENTRIES
-----------------------
You can obtain a list of the contents of an archive by using the "L"
(List) command or the "V" (Verbose list) command. For example, to see
what is in an archive named "JUNK.ARC", you could type:
ARC l junk
If you are only interested in files with an extension of ".DOC", then
you could type:
ARC l junk *.doc
ARC prints a short listing of an archive's contents like this:
Name          Length    Date
============  ========  =========
ALPHA.TXT         6784  16 May 85
BRAVO.TXT         2432  16 May 85
COCO.TXT           256  16 May 85
"Name" is simply the name of the file.
"Length" is the unpacked file length. In other words, it is the
number of bytes of disk space which the file would take up if it were
extracted.
"Date" is the date on which the file had last been modified, as of the
time when it was added to the archive.
ARC prints a verbose listing of an archive's contents like this:
Name       Length    Stowage   SF    Size now  Date       Time    CRC
=========  ========  ========  ====  ========  =========  ======  ====
ALPHA.TXT      6784  Squeezed   35%      4413  16 May 85  11:53a  8708
BRAVO.TXT      2432  Squeezed   41%      1438  16 May 85  11:53a  5BD6
COCO.TXT        256  Packed      5%       244  16 May 85  11:53a  3AFB
"Name", "Length", and "Date" are the same as for a short listing.
"Stowage" is the compression method used. The following compression
methods are currently employed:
--        No compression.
Packed    Runs of repeated byte values are collapsed.
Squeezed  Huffman squeeze technique employed.
Crunched  Lempel-Ziv compression technique employed.
"SF" is the stowage factor. In other words, it is the percentage of
the file length which was saved by compression.
"Size now" is the number of bytes the file is occupying while in the
archive.
"Time" is the time of last modification, and is associated with the
date of last modification.
"CRC" is the CRC check value which has been stored with the file.
Another CRC value will be calculated when the file is extracted or
tested to ensure data integrity. There is no especially good reason
for displaying this value.
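The stowage factor in the sample listing can be reproduced with a few lines of Python. Rounding to the nearest whole percent is an assumption that happens to match the three sample rows; the function name is hypothetical.

```python
def stowage_factor(length: int, size_now: int) -> int:
    """Percentage of the original length saved by compression."""
    if length == 0:
        return 0
    return round(100 * (length - size_now) / length)


# (Length, Size now) pairs taken from the sample verbose listing.
for name, length, size_now in [("ALPHA.TXT", 6784, 4413),
                               ("BRAVO.TXT", 2432, 1438),
                               ("COCO.TXT", 256, 244)]:
    print(name, str(stowage_factor(length, size_now)) + "%")
```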
BACKUP RETENTION
----------------
When ARC adds or deletes archive entries it renames the original
archive to give it an extension of ".BAK", and then creates a new
archive with the desired changes. If you wish to retain this original
copy of the archive for backup purposes, then add the "B" (Backup)
command to your other commands.
For example, if you wanted to delete all entries with an extension of
".DOC" from an archive named "JUNK.ARC", but you wanted to keep a copy
around that still has them, then you could type:
ARC bd junk *.doc
or:
ARC db junk *.doc
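The rename-then-rewrite sequence described above might look like this in Python. It is a sketch of the idea only; the function name and the keep_backup parameter are hypothetical, and ARC's real handling of temporary files is not shown.

```python
import os


def rewrite_archive(path: str, new_contents: bytes, keep_backup: bool = False) -> str:
    """Replace an archive, keeping the old copy as .BAK if requested."""
    backup = os.path.splitext(path)[0] + ".BAK"
    os.replace(path, backup)          # the original becomes the backup
    with open(path, "wb") as archive:
        archive.write(new_contents)   # a failure here leaves the .BAK behind
    if not keep_backup:
        os.remove(backup)
    return backup
```

Because the backup is removed only after the new archive has been written, an error part-way through leaves the original data on disk under the ".BAK" name.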
MESSAGE SUPPRESSION
-------------------
ARC prints two types of messages, warnings and comments.
Warnings are messages about suspected error conditions, such as when a
file to be extracted already exists, or when an extracted file fails
the CRC error check. Warnings may be suppressed by use of the "W"
(Warn) command. You should use this command sparingly. In fact, you
should probably not use this command at all.
Comments (or notes) are informative messages, such as naming each file
as it is added to the archive. Comments and notes may be suppressed
by use of the "N" (Note) command.
For example, suppose you extracted all files with an extension of
".BAS" from an archive named "JUNK.ARC". Then, after making some
changes which you decide not to keep, you decide that you want to
extract them all again, but you don't want to be asked to confirm
every one. In this case, you could type:
ARC xw junk *.bas
Or, if you are going to add a hundred files with an extension of
".MSG" to an archive named "TRASH.ARC", and you don't want ARC to list
them as it adds them, you could type:
ARC an trash *.msg
Or, if you want to extract the entire contents of an archive named
"JUNK.ARC", and you don't want to hear anything, then type:
ARC xnw junk
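Combined command strings such as "bd", "xw", and "xnw" can be taken apart as in the Python sketch below. Treating B, W, and N as modifiers, requiring exactly one other command letter, and all names used here are assumptions made for illustration; this file does not say how ARC handles other combinations.

```python
ACTIONS = set("aumdxerplvtc")   # command letters from the summary above
MODIFIERS = set("bwn")          # backup, suppress warnings, suppress notes


def parse_command(letters: str):
    """Split a command string into (action, set of modifiers)."""
    action = None
    modifiers = set()
    for letter in letters.lower():
        if letter in MODIFIERS:
            modifiers.add(letter)
        elif letter in ACTIONS:
            if action is not None:
                raise ValueError("more than one action in " + repr(letters))
            action = "x" if letter == "e" else letter  # E and X both extract
        else:
            raise ValueError("unknown command letter " + repr(letter))
    if action is None:
        raise ValueError("no action in " + repr(letters))
    return action, modifiers
```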
TESTING AN ARCHIVE
------------------
The integrity of an archive may be tested by use of the "T" (Test)
command. This checks to make sure that all of the file headers are
properly placed, and that all of the files are in good shape.
This can be very useful for critical archives, where data integrity
must be assured. When an archive is tested, all of the entries in the
archive are unpacked (without saving them anywhere) so that a CRC
check value may be calculated and compared with the recorded CRC
value.
For example, if you just received an archive named "JUNK.ARC" over a
phone line, and you want to make sure that you received it properly,
you could type:
ARC t junk
It defeats the purpose of the T command to combine it with N or W.
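The check performed by the Test command amounts to recomputing a 16-bit CRC over the unpacked data and comparing it with the recorded value. The Python sketch below assumes the reflected polynomial 0xA001 with a zero initial value, the parameter set commonly catalogued as CRC-16/ARC; this file does not state the polynomial, and the function names are hypothetical.

```python
def crc16(data: bytes, crc: int = 0) -> int:
    """Bit-by-bit CRC-16 (reflected polynomial 0xA001, initial value 0)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def entry_is_intact(unpacked: bytes, recorded_crc: int) -> bool:
    """Compare a freshly calculated CRC with the value stored in the archive."""
    return crc16(unpacked) == recorded_crc
```

A table-driven version would be faster, but the bitwise form makes the calculation easier to follow.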
CONVERTING AN ARCHIVE
---------------------
The "C" (Convert) command is used to convert an archive entry to take
advantage of newer compression techniques. For example, if you had an
archive named "JUNK.ARC", and you wanted to make sure that all files
with an extension of ".DOC" were encoded using the very latest
methods, you could type:
ARC c junk *.doc
Or if you wanted to convert every file in the archive, you could type:
ARC c junk
RAMDISK SUPPORT
---------------
If you have a ramdisk, or other high-speed storage, then you can speed
up ARC somewhat by telling it to put its temporary files on the
ramdisk. You do this by setting the ARCTEMP environment string with
the MS-DOS SET command. For example, if drive B: is your ramdisk,
then you would type:
set ARCTEMP=B:
Refer to the MS-DOS manual for more details about the SET command.
You need only set the ARCTEMP string once, and ARC will use it from
then on until you reboot or change its value.
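A program can honor a setting like ARCTEMP along the lines of the following Python sketch. The temporary file name used in the example, the rule for inserting a backslash, and the function name are all assumptions; this file does not describe how ARC builds the path.

```python
import os


def temp_file_path(filename: str, environ=None) -> str:
    """Prefix a temporary file name with ARCTEMP when it is set."""
    env = os.environ if environ is None else environ
    location = env.get("ARCTEMP", "")
    if location and not location.endswith((":", "\\", "/")):
        location += "\\"   # assumed: a bare directory needs a separator
    return location + filename


# "ARC.TMP" is a made-up name used only for this example.
print(temp_file_path("ARC.TMP", {"ARCTEMP": "B:"}))
```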
VERSION NUMBERS
---------------
There seems to be some confusion about our version numbering scheme.
All of our version numbers are given as a number with two decimal
places.
The units indicate a major revision, such as adding a new packing
algorithm.
The first decimal place (tenths) indicates a minor revision that is
not essential, but which may be desired.
The second decimal place (hundredths) indicates a trivial revision
that will probably only be desired by specific individuals or by
diehard "latest version" fanatics.
ARC also displays its date and time of last edit. A change of the
date and time without a corresponding change in version number
indicates a truly trivial change, such as fixing a spelling error.
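The numbering scheme can be expressed as a small Python sketch that splits a version string into units, tenths, and hundredths and reports which level changed. The function names and the labels returned are chosen for this example only.

```python
def split_version(version: str):
    """Return (units, tenths, hundredths) for a string such as "4.31"."""
    units, _, decimals = version.partition(".")
    decimals = (decimals + "00")[:2]   # "4.4" is treated as "4.40"
    return int(units), int(decimals[0]), int(decimals[1])


def classify_change(old: str, new: str) -> str:
    """Name the most significant place that differs between two versions."""
    old_parts, new_parts = split_version(old), split_version(new)
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    if new_parts[2] != old_parts[2]:
        return "trivial"
    return "none"
```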
To sum up: If the units change, then you should get the newer version.
If the tenths change, then you may want to get the newer version. If
anything else changes, then you probably shouldn't bother. This is
reflected by our own habit of referring to "version 4.3" instead of
"version 4.31".
SPECIAL NOTES
=============
Whenever ARC encounters a fatal error condition it leaves the original
archive on disk, renamed to have an extension of ".BAK" (backup).
The function used to calculate the CRC check value in previous
versions has been found to be in error. It has been replaced in
version 3.0 with a proper function. ARC will still read archives
created with earlier versions of ARC, but it will report a warning
that the CRC value is in error. All archives created prior to version
3.0 should be unpacked and repacked with the latest version of ARC.
Transmitting a file with XMODEM protocol rounds the size up to the
next multiple of 128 bytes, adding garbage to the end of the file.
This used to confuse ARC, causing it to think that the end of the
archive was invalidly formatted. This has been corrected in version
3.03. Older archives may still be read, but ARC may report them to be
improperly formatted. All files can be extracted, and no data is
lost. In addition, ARC will automatically correct the problem when it
is encountered.
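The size of a file after an XMODEM transfer follows from simple arithmetic, sketched here in Python. The function names are hypothetical.

```python
BLOCK = 128  # XMODEM block size in bytes, as stated above


def xmodem_size(size: int) -> int:
    """Size after transfer: rounded up to a whole number of blocks."""
    return -(-size // BLOCK) * BLOCK


def padding_added(size: int) -> int:
    """Number of garbage bytes appended to the end of the file."""
    return xmodem_size(size) - size
```

A 1000-byte archive, for instance, arrives as 1024 bytes, with 24 bytes of padding after the real end of the archive.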
CHANGES IN VERSION 4
====================
ARC is adding another data compression technique in this version. We
have been looking for some technique that could improve on Huffman
squeezing in at least a few cases. So far, Lempel-Ziv compression
seems to be fulfilling our fondest hopes, often achieving compression
rates as much as 20% better than squeezing, and sometimes even better.
Huffman squeezing depends on some bytes being more "popular"
than others, taking the file as a whole. Lempel-Ziv compression is
instead looking for long strings of bytes which are repeated at
various points (such as an end of line followed by spaces for
indentation). Lempel-Ziv compression is therefore looking for
repetition at a more "macro" level, often achieving impressive packing
rates.
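The idea of building string codes "on the fly" can be seen in the textbook LZW sketch below, written in Python. It is not ARC's code: the table is capped at 4096 entries to match the twelve bit codes mentioned earlier, the codes are kept in a list instead of being packed into bit fields, and all names are chosen for this example.

```python
MAX_CODES = 1 << 12  # twelve-bit codes allow 4096 table entries


def lzw_encode(data: bytes):
    """Return a list of codes; repeated strings get codes of their own."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])
            if len(table) < MAX_CODES:
                table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Rebuild the data, recreating the same table while reading codes."""
    table = {i: bytes([i]) for i in range(256)}
    out = bytearray()
    previous = None
    for code in codes:
        if code in table:
            entry = table[code]
        elif previous is not None and code == len(table):
            entry = previous + previous[:1]   # code defined by this very step
        else:
            raise ValueError("invalid code " + str(code))
        out += entry
        if previous is not None and len(table) < MAX_CODES:
            table[len(table)] = previous + entry[:1]
        previous = entry
    return bytes(out)
```

Text with repeated phrases produces far fewer codes than input bytes, while data with no repeated strings gains nothing, which is why ARC compares methods before choosing one.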
Alas, nothing ever comes free. This gain in storage efficiency comes
at the price of processor time. ARC version 4.0 will usually take
about twice as long to add a file to an archive as version 3.1 did.
We intend to work on improving this in the future, but it will always
be slower since it must now work much harder to determine the best
packing method.
Fortunately, file extraction is only slightly slower, to the point
where it will probably go unnoticed.
In the typical case a file is added to an archive once and then
extracted many times, so the increased time for an update should more
than pay for itself in increased disk space and reduced file
transmission time.
As usual, ARC version 4.0 is completely upward compatible. That is,
it can deal properly with any archive created by any earlier version
of ARC. It is NOT reverse compatible. Archives created by ARC 4.0
will generally not be usable by earlier versions of ARC.
CHANGES IN VERSION 4.1
======================
Version 4.1 does not contain any major changes from version 4.0.
Lempel-Ziv coding has been improved somewhat by performing non-repeat
compression on the data before it is coded (as was already done with
Huffman squeezing). This has the twofold advantage of (a) reducing
to some extent the amount of data to be encoded, and (b) increasing
the time it takes for the string table to fill up. Performance gains
are small, but noticeable.
The primary changes are in internal organization. ARC is now much
"cleaner" inside. In addition to the esthetic benefits to the author,
this should make life easier for the hackers out there. There is also
a slight, but not noticeable, improvement in overall speed when doing
an update.
Version 4.1 is still fully upward compatible. But regrettably, it is
again not downward compatible. Version 4.1 can handle any existing
archive, but creates archives which older versions (including 4.0)
cannot unpack.
CHANGES IN VERSION 4.3
======================
Version 4.3 adds the much-demanded feature of using pathnames when
adding files to an archive. For obscure technical reasons, files
being extracted still go in the current directory on the current
drive. Pathnames are also not supported for any of the other
commands, because it would make no sense.
Version 4.3 is also using a slightly different approach when adding a
file to an archive. The end result is twofold:
1) Slightly more disk space is required on the drive containing the
archive. This should only be noticeable to those creating very
large archives on a floppy based system.
2) A 30% reduction in packing time has been achieved in most cases.
This should be noticeable to everyone.
As always, version 4.3 is still fully upwards compatible, and is
backwards compatible as far as version 4.1.
CHANGES IN VERSION 4.4
======================
The temporary file introduced in version 4.3 occasionally caused
problems for people who had not added a FILES= statement to their
CONFIG.SYS file. This has now been corrected. Also, support of the
ARCTEMP environment string was added to allow placing of the temporary
file on a ramdisk.
A bug was reported in the Run command, which has been fixed. From the
extreme time required before the bug was reported, it is deduced that
the Run command is probably the least used feature of ARC.
The Update command was changed. It is no longer a straight synonym
for Add. Instead, Update now only adds a file if it is newer than the
version already in the archive, as shown by the MS-DOS date/time
stamp.
PROGRAM HISTORY AND CREDITS
===========================
In its short life thus far, ARC has astounded us with its popularity.
We first wrote it in March of 1985 because we wanted an archive
utility that used a distributive directory approach, since this has
certain advantages over the more popular central directory approach.
We added automatic squeezing in version 2 at the prompting of a
friend. In version 2.1 we added the code to test for the best
compression method. Now (in October of 1985) we find that our humble
little program has spread across the country, and seems to have become
a new institution.
We are thankful for the support and appreciation we have received. We
hope that you find this program of use.
If we have achieved greatness, it is because we have stood upon the
shoulders of giants. Nothing is created as a thing unto itself, and
ARC is no exception. Therefore, we would like to give credit to the
following people, without whose efforts ARC could not exist:
Brian W. Kernighan and P. J. Plauger, whose book "Software Tools"
provided many of the ideas behind the distributive directory approach
used by ARC.
Dick Greenlaw, who wrote the public domain SQ and USQ programs, in
which the Huffman squeezing algorithm was first developed.
Robert J. Beilstein, who adapted SQ and USQ to Computer Innovations
C86 (the language we use), thus providing us with important parts of
our squeezing logic.
Kent Williams, who graciously allowed us to use his LZWCOM and LZWUNC
programs as a basis for our Lempel-Ziv compression logic.
David Schwaderer, whose article in the April 1985 issue of PC Tech
Journal provided us with the logic for calculating the CRC 16 bit
polynomial.
And many, many others whom we could not identify.